Quantitation of chromatographic information.



A process and apparatus are disclosed for quantitating chromatographic information. The method uses a
discrete, linear, translation-invariant filter function, where N is a measure of the filter function width and a is
a parameter whose value determines signal-to-noise characteristics of the filter function. First, a chromatographic
analysis of a sample is performed to obtain a first chromatogram. Then the first chromatogram is filtered with the
filter function, with N set to approximate the width of peaks obtained in the first chromatogram and a set to
filter out high frequency noise from the first chromatogram, to obtain a second chromatogram having a first
filtered baseline. The second chromatogram is then filtered with a set to resolution-enhance peaks in the second
chromatogram to obtain a third chromatogram having a baseline which is substantially the same as the first
filtered baseline. The second chromatogram is then subtracted from the third chromatogram to obtain a fourth
chromatogram which is baseline corrected.
QUANTITATION OF CHROMATOGRAPHIC INFORMATION



Background of the Invention 

This invention relates to a method and apparatus of general utility for analyzing chromatographic
information, and particularly to the use of such for analyzing chromatographic information pertaining to the
automatic determination of peptide sequences.

The chemical process employed by protein/peptide sequencers is derived from a technique originated
by Edman for the sequential degradation of peptide chains (Edman, Acta Chem. Scand. [...]). The first step is
coupling of a peptide's amino-terminal amino acid with the Edman reagent, phenylisothiocyanate (PITC), a
reaction catalyzed by an organic base delivered with the coupling reagent. The second step is cleavage of
this derivatized amino acid from the remainder of the peptide, a reaction effected by treating the peptide
with a strong organic acid. Each repeated coupling/cleavage cycle occurs at the newly-formed
amino-terminal amino acid left by the previous cycle. Thus, repetitive cycles provide sequential separation of the
amino acids which form the primary structure of the peptide.

The sequencing process is not completed by the Edman degradation alone. Once the amino acids are
removed from the sample, they must be analyzed to determine their identity. Since the cleaved amino acid
derivative, the anilinothiazolinone (ATZ), is not generally suitable for analysis, it is converted to the more
stable phenylthiohydantoin (PTH) form before analysis is attempted. In modern sequencers (Wittmann-Liebold
et al., Anal. Biochem. 75, 621, 1976; Hewick et al., J. Biol. Chem. 256, 7990, 1981), this conversion is
performed automatically in a reaction vessel separate from that in which Edman degradation occurs.
The ATZ produced at each degradation cycle is extracted from the peptide with an organic solvent,
transferred to the reaction vessel and treated with an aqueous solution of a strong organic acid to effect
conversion to the PTH. The PTHs produced from each degradation cycle may be transferred to fraction
collector vials until several are manually collected and prepared for analysis. Alternatively, the PTHs may be
transferred directly and automatically from the sequencer conversion vessel to an on-line analysis system
(Machleidt, W. and Hoffner, H., in Methods in Peptide and Protein Sequence Analysis, pp 35-47, Birr, ed.,
Elsevier (1980); Wittmann-Liebold and Ashman, in Modern Methods in Protein Chemistry, pp 303-327,
Tschesche, ed., de Gruyter (1985); Rodriguez, J. Chromatography 350 (1985)).
A variety of analytical procedures have been used to identify the amino acids released during
[...]. Of these, high performance liquid chromatography (HPLC) is currently in widespread
use. In fact, HPLC on reverse phase, silica-based packings has revolutionized peptide sequencing. It
[...] quantitative analysis of PTH amino acids and is presently the only technique
[...] that can reliably resolve all of the PTH amino acids in a single chromatographic run.
More importantly, since it provides quantitative data at the picomole level, HPLC is the only analytical method
suitable for microsequencing by automated Edman sequencers at the present time.

[...] As a practical matter this is never the case: each chromatogram contains some amount of
[...] PTHs, and [...] relative amounts must be made in order to [...] the
sequence assignment. Several factors give rise to this problem. First, protein or peptide samples are
[...] pure; they always contain some level of peptides or free amino acids that give rise to
[...] sequencing. Second, repeated exposure of the sample to the cleavage acid during the
[...] causes splitting of peptide chains at sites other than the amino terminus. The newly
exposed amino termini resulting from these internal splits then produce PTHs after subsequent
[...]. As a result, each type of amino acid generally exhibits a background PTH level
[...] through the early cycles and [...].

[...] amino acid that should have been released at that cycle will remain [...].
[...] this carry-over [...] is cumulative [...] any single [...]
will result in a steadily increasing proportion of a population of molecules being out of phase with the
[...] internal chain cleavage, and lag. This decrease in signal, measured as the repetitive cycle yield, occurs
simultaneously with the increase in noise (due to the factors described above), making correct amino acid
assignment ever more difficult as a sequencing run proceeds further into the peptide. Fifth, the relative
recoveries of the PTH amino acids from the Edman chemistry vary. Some are recovered almost quantitatively,
while others are largely destroyed before analysis.

Despite these problems, rigorous interpretation of the chromatographic data from a sequencer run in
terms of an amino acid sequence has not received as much attention as the chemistry and instrumentation
employed. Many, perhaps most, sequences are assigned by visual inspection of chromatograms to
distinguish the specific increase in the PTH level of one amino acid at each cycle from the general
background level of all the PTHs. This method is remarkably simple and effective, but it does have
limitations. It relies on the scientist's pattern recognition abilities, skills that are largely subjective and limited
to direct comparison of only two to three chromatograms at any one time.

Because of these limitations, an increasing number of scientists are using HPLC peak integration
systems to translate the analog signals displayed on chart recorder traces into a simple set of digital
numbers. This allows the recovery of each PTH at each cycle to be plotted on a graph that more clearly
shows the specific sequence signals superimposed on the background noise levels. Smithies et al., see
Biochemistry 10, 4912 (1971), were the first to define the mathematics of the sequencing chemistry in
terms of initial yield, repetitive yield, lag, and amino acid background and to attempt quantitative sequence
analysis based on peak integration. Machleidt, W. and Hofner, F., (1981), in High Performance Chromatography
in Protein & Peptide Chemistry, pp 245-258, Walter de Gruyter, Berlin, have also contributed to this
process. But all of the previous methods have relied on the subjective grading of the integrated peak values
by the skilled scientist performing the sequence analysis. The scientist's subjective interpretation of the
relative importance of an elevated level of one amino acid versus another at any given cycle has still been
required for the final sequence assignment.
In addition to all of the above difficulties having to do with background PTH levels, cumulative lag, side
reactions, etc., other important problems are associated with the chromatographic data itself. While most
chromatography software available commercially works well with ideal data (i.e. with large, well-resolved
peaks), it performs much less well with real world data. With respect to analyses of amino acid
derivatives, such non-ideal data is the rule rather than the exception. Generally, amino acid analyses involve
separations of a complex mixture of closely-related compounds, frequently at such minute levels that
conventional software fails to provide satisfactory results unless the user provides extensive manual input to
correct the deficiencies in the software.

In concept, HPLC data systems collect chromatographic data by periodically sampling the output of the
HPLC detector and then process this digitized data. Quantitation is then performed using peak integration,
which requires locating the start and end points of a peak, measuring the total signal between these points,
and subtracting any background signal. The center position of the peak (i.e. its retention time) is also
required to identify it as a known component based on retention times obtained with standards. Then, the
measured area of a sample peak can be converted to a molar amount based on the measured area of the
corresponding standard. This conceptually simple process is, however, complicated by several factors,
such as chromatographic noise, peak overlap, and retention time drift.
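
The peak integration steps just described can be illustrated with a minimal sketch. It assumes evenly spaced detector samples, a straight-line background drawn between the peak's start and end points, and the trapezoidal rule; the function names and the sample values are hypothetical and are not taken from any particular HPLC data system.

```python
def integrate_peak(signal, start, end, dt=1.0):
    """Area between indices start..end (inclusive) above a straight-line
    background drawn between the two end points (trapezoidal rule)."""
    total = sum(0.5 * (signal[i] + signal[i + 1]) * dt for i in range(start, end))
    background = 0.5 * (signal[start] + signal[end]) * (end - start) * dt
    return total - background


def retention_index(signal, start, end):
    """Sample index of the peak apex, standing in for the retention time."""
    return max(range(start, end + 1), key=lambda i: signal[i])


def molar_amount(sample_area, standard_area, standard_pmol):
    """Scale a sample area by the area measured for a known standard amount."""
    return standard_pmol * sample_area / standard_area


trace = [1.0, 1.0, 2.0, 4.0, 2.0, 1.0, 1.0]   # made-up detector samples
area = integrate_peak(trace, 1, 5)
apex = retention_index(trace, 1, 5)
pmol = molar_amount(area, standard_area=10.0, standard_pmol=5.0)
```

In this toy trace the total signal between the end points is 9.0, the background under the peak is 4.0, and the remaining area of 5.0 corresponds to 2.5 pmol if a 5 pmol standard gave an area of 10.0.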

The chromatographic noise arises from the detector electronics, incomplete mixing of solvents during
gradient chromatography, passage of gas bubbles or particulates through the detector, refractive index
changes due to solvent or temperature gradients, and the elution of solvent or column contaminants. At the
present time conventional HPLC systems deal rather imperfectly with both low and high frequency
chromatographic noise. Most high frequency filtering relies on hardware implementations and is performed
by analog filters built into the detector circuitry, and some HPLC systems attempt to remove low frequency
noise (often called baseline drift) by using point-by-point subtraction of a blank chromatogram from the
sample data. This latter technique is particularly troublesome since it introduces additional high frequency
noise and because baselines can vary substantially from run to run. New and less cumbersome techniques
are clearly needed for the reduction of chromatographic noise.

Peak overlap, i.e. incompletely resolved peaks, is particularly troublesome to HPLC software. Small
peaks that partially overlap larger ones may be missed by the [...].
[...] (i) [...] from the valley between the peaks to the baseline, (ii) a similar extrapolation to
set the baseline of a major component with a tangent skim to set the baseline of a minor component, and
(iii) linear extrapolations between the beginning and end points of each separate component. The method
which gives the most accurate peak measurements depends on both the degree of resolution between
the peaks and the relative peak heights. It is, therefore, highly sample dependent and frequently requires
user adjustment from one sample to another in a set of chromatograms in which these parameters are not
constant.

Retention time drift is also a particular problem since once peaks have been located and quantitated,
they must be identified by matching their retention times to those of known standards. This is simple if the
variation in retention times from run to run is always less than the time separation between closely eluting
peaks within a run. Typically, software routines are set to search an elution time "window" centered on the
standard elution time to find the best match of an unknown peak with the standard. With complex
separations that produce closely spaced peaks, this does not always work since elution time drift may move
one peak outside its window and place another in it. This problem can be minimized, however, by using
easily identified reference peaks to measure the drift and empirically correct the search windows for other
peaks. The reference peaks must be well-separated from any neighboring peaks and present in all
chromatograms so their search windows can be large enough to allow for the maximum observable drift.
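
The window search and reference-peak drift correction described above can be sketched as follows. This is only an illustration under stated assumptions: peaks are given as (time, height) pairs, the drift is taken as a single constant offset measured from one reference peak, and all names and numbers (components "A" and "B", the reference at 20.0) are hypothetical.

```python
def measure_drift(peaks, ref_time, ref_half_width):
    """Observed minus expected elution time of the reference peak, taken as
    the tallest peak inside a deliberately wide window around ref_time."""
    candidates = [p for p in peaks if abs(p[0] - ref_time) <= ref_half_width]
    if not candidates:
        return 0.0
    return max(candidates, key=lambda p: p[1])[0] - ref_time


def assign_peaks(peaks, standards, half_width, drift=0.0):
    """Match each standard to the peak closest to its (drift-shifted) window centre."""
    calls = {}
    for name, t_std in standards.items():
        centre = t_std + drift
        in_window = [p for p in peaks if abs(p[0] - centre) <= half_width]
        if in_window:
            calls[name] = min(in_window, key=lambda p: abs(p[0] - centre))
    return calls


standards = {"A": 10.0, "B": 11.0}                 # elution times from a standard run
peaks = [(10.6, 5.0), (11.6, 3.0), (20.6, 9.0)]    # every peak has drifted by +0.6

uncorrected = assign_peaks(peaks, standards, half_width=0.5)
drift = measure_drift(peaks, ref_time=20.0, ref_half_width=2.0)
corrected = assign_peaks(peaks, standards, half_width=0.5, drift=drift)
```

Without correction the peak at 10.6 falls outside the window for "A" and inside the window for "B", which is the failure described in the text; after shifting the windows by the measured drift both components are matched to the intended peaks.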

What is needed is a method that resolves most of these problems with chromatogram quantitation and
which can be used by a computer to evaluate the set of HPLC data derived from a peptide sequencing run
to automatically arrive at an unequivocal call of the sequences, without having to rely on the subjective
interpretations of especially skilled individuals.


Summary of the Invention 



In accordance with preferred embodiments of the invention, an apparatus and a process are disclosed
for unequivocally determining the sequence of a peptide. According to the method, the peptide to be
sequenced is degraded cyclically, arriving at a set of amino acid residues for each cycle. The amount of each
amino acid residue is quantitatively measured in each set, then a background level is fit to each cycle to
obtain a background fit. A measure of dispersion is then calculated for the background fit, and the
measured amounts of amino acid residues in each cycle are normalized relative to the background fit. The
largest normalized background-corrected residue amount in each cycle then provides a sequence assignment
that can be used for further correction steps if desired. These further steps include correcting at least
some of the normalized background-corrected residue amounts for lag into subsequent cycles, thereby
obtaining lag-corrected background-corrected residue amounts for each cycle. These lag-corrected
background-corrected residue amounts can then be used to correct the original measurements of the amino
acid residues in each cycle for differences in injection quantity between cycles, thereby obtaining an
injection-corrected residue amount for each cycle. The previous steps of background correction and lag
correction are then performed on the injection-corrected residue amounts, and the largest such amount is
determined for each cycle to arrive at a definitive sequence assignment for the peptide.
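
The first-pass assignment (background fit, dispersion, normalization, largest normalized amount per cycle) can be illustrated with a small sketch. The background model and dispersion measure used here are stand-ins chosen for brevity, not the ones of the preferred embodiment: the background of each residue is assumed to be its median over all cycles, and the dispersion is assumed to be the root-mean-square deviation from that median. The residue names and amounts are made up.

```python
from statistics import median


def normalize(amounts):
    """amounts maps residue -> list of measured amounts, one per cycle.
    Returns background-corrected amounts divided by the dispersion."""
    normalized = {}
    for residue, series in amounts.items():
        background = median(series)                     # assumed background model
        resid = [x - background for x in series]
        dispersion = (sum(r * r for r in resid) / len(resid)) ** 0.5 or 1.0
        normalized[residue] = [r / dispersion for r in resid]
    return normalized


def call_sequence(amounts):
    """Pick, for every cycle, the residue with the largest normalized amount."""
    normalized = normalize(amounts)
    n_cycles = len(next(iter(normalized.values())))
    return [max(normalized, key=lambda res: normalized[res][c])
            for c in range(n_cycles)]


amounts = {                      # hypothetical pmol values for four cycles
    "Ala": [12, 3, 2, 3],
    "Gly": [20, 45, 21, 20],     # high, steady background
    "Leu": [1, 1, 9, 1],
    "Val": [2, 2, 2, 8],
}
calls = call_sequence(amounts)
```

In this example the raw maximum at the first cycle is Gly, because of its high background, whereas the normalized amounts give Ala, Gly, Leu, Val for the four cycles.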

In the preferred mode, the initial step of measuring the amount of amino acid residue in each cycle
usually involves several sub-steps. In particular, as the peptide is degraded cyclically using an Edman
degradation scheme, a chromatographic analysis is performed for each cycle, the raw data providing a
determination of the amount of amino acid of each kind found at each cycle. In one embodiment, the results
of that determination are then subjected to low pass filtering and high pass filtering to remove measurement
noise, resulting in a baseline-corrected chromatogram. This baseline-corrected chromatogram is then used
in the background-correction process.

In another embodiment, the baseline-corrected chromatogram is obtained using a linear,
translation-invariant, discrete filter function, where N is a measure of the filter width and a determines the
resolution enhancement characteristics of the filter function. In that embodiment, the chromatogram is first
smoothed with the filter function to suppress high frequency noise, thereby providing a second (filtered)
chromatogram with a first filtered baseline. Then the second chromatogram is filtered with the parameter a
set for peak resolution enhancement to obtain a third chromatogram having substantially the same filtered
baseline. [...]

The apparatus for carrying out the [...] includes a degradation element for
degrading the peptide cyclically to obtain the set of amino acid residues for each cycle, a quantitation
element for measuring the amount of each amino acid residue in each set, and a computer element for
controlling the degradation element and the quantitation element. The computer element is also for fitting a
background level to each cycle, for calculating a measure of dispersion for the background fit relative to the
measured amounts of residues in each set, for normalizing the measured amount of each residue relative to
the dispersion to obtain normalized background-corrected residue amounts for each cycle, and for
identifying the largest normalized background-corrected residue amount in each cycle.

Brief Description of the Drawings 



Fig. 1A is a flow chart of a method in accordance with the preferred embodiment of the invention.

Fig. 1B is a diagram of the apparatus of the invention.

Figs. 2A and 2B show the steps of the method for quantitating HPLC peaks.

Figs. 2C and 2D show the steps of the method for quantitating HPLC peaks according to a second
embodiment of the invention.

Fig. 3A(1) shows a raw chromatogram of a standard PTH mix.

Fig. 3A(2) shows a baseline-corrected chromatogram performed according to a method of the
invention for the standard mix having the raw chromatogram of Fig. 3A(1).

Fig. 3B(1) shows two components of a house filter according to the method of the invention.

Fig. 3B(2) shows the house filter formed by adding the two components of Fig. 3B(1).

Figs. 3C-3I show the results of applying a filtering method according to the invention to the standard
PTH mix of Fig. 3A(1).

Figs. 3C', 3D', and 3I' show the results illustrated in Figs. 3C, 3D, and 3I, but on an expanded scale.

Figs. 3J-3M show the results of applying the filtering method according to the invention to a second
standard PTH mix.

Figs. 4A and 4B are a flow chart showing the method used for correction of background levels in the
amount of each amino acid residue measured for each cycle.

Figs. 5A-5D are a flow chart of the method used to correct for lag of each amino acid residue into
subsequent cycles.

Fig. 6 is a flow chart illustrating the method used to correct for differences in injection amounts for
different cycles.

Figs. 7A-7C show the results of the method of the invention at different stages of the correction
process for the 51st cycle in the degradation of a protein. Fig. 7A shows the injection-corrected raw data.
Fig. 7B shows the results of the method after performing background correction on the injection-corrected
raw data of Fig. 7A. Fig. 7C shows the results of performing a lag correction after performing the
background correction illustrated in Fig. 7B.

Figs. 8A-8C show the results of the method of the invention at different stages of the correction
process for glutamic acid at different cycles in the degradation of a protein. Fig. 8A shows the raw data
(injection-corrected). Fig. 8B shows the results of the method after performing background correction on the
injection-corrected raw data of Fig. 8A. Fig. 8C shows the results of performing a lag correction after
performing the background correction illustrated in Fig. 8B.



Detailed Description of the Preferred Embodiments 



In accordance with preferred embodiments of the invention, illustrated in Fig. 1A is a flow chart showing
a general method for determining the sequence of a peptide chain. In this method, the various steps
comprise both direct physical measurements and computational elements based on those measurements.
Since the method can be implemented entirely automatically, with the various physical measurements being
performed by instruments under computer control, the language of computer programming lends itself to a
description of the invention. Hence, to be relatively consistent with that language, hereinafter each step of
the method will be referred to either as a "program element" or as a "subroutine" rather than as a "step".
[...] program element [...] a complete task, be it a physical measurement or a
computation, but which is not in itself necessarily a single subroutine.

The physical apparatus used to implement the method is illustrated in Fig. 1B and includes a
sequencing system 51 for performing sequential Edman degradations, a PTH analyzer 53 which provides
on-line PTH analysis using HPLC for the samples provided by the sequencing system, and a computer
module 55, which acts both as a system controller and as the analysis device for the entire system for
performing the various steps used to arrive at a quantitative call for the peptide sequence. As a system
controller the computer module is responsible for timing and incrementing degradation events in the
sequencing system, for transferring materials to the PTH analyzer, and for controlling the PTH identification
process. As an analysis device, the computer module stores the data provided by the PTH analyzer, and
performs the various steps described below to eliminate noise and systematic errors from the raw PTH
values measured by the PTH analyzer in order to identify the largest relative PTH value in each cycle and
to thereby determine the peptide sequence. In the preferred mode, the computer module 55 is an Applied
Biosystems Model 900A which includes a 16/8-bit microprocessor such as the Intel 8088, a separate math
co-processor, 640 kilobytes of RAM, a 10-megabyte hard disk drive, a 360-kilobyte floppy disk drive, a
touchscreen CRT, and a graphics printer. Also in the preferred mode, the sequencing system 51 is an
Applied Biosystems Model 470A Sequencer and the PTH analyzer 53 is an Applied Biosystems Model
120A.

To begin the method, the peptide sequence to be analyzed is first subjected to an automated Edman
degradation cycle at program element 11, and HPLC is performed by the PTH Analyzer at program element
12 to identify the amino acid released. For that Edman cycle, the chromatogram is collected in digitized
form by the computer module, and the values of the chromatogram are stored. At program element 13, the
cycle number of the degradation just performed is incremented and the process is repeated, until the
degradation and chromatograms are performed for all amino acids in the peptide chain, each amino acid
corresponding to one degradation cycle. At subroutine 15, each chromatogram is identified and quantitated
to determine the amount of each PTH for each cycle of the degradation.

Once the identification and quantitation is complete, a first pass at PTH background noise removal is
performed in subroutine 17, and a preliminary sequence assignment is made. This is followed by a lag
correction in subroutine 19. The lag correction is made to account for the fact that during the Edman
degradation, the removal of the amino acid residue is only partial, so that a fraction of the amino acid
appears at subsequent cycles.

Once the lag correction is performed, the remaining PTH values for each cycle can be used to correct
for any variation in the amount of sample injected into the PTH analyzer at each cycle, if desired. This
injection correction is performed in subroutine 23. Upon completion of the injection correction, the
background and lag corrections are repeated and the amino acid assignments are made in program
element 25 based on the largest corrected PTH signal at each cycle. The second pass through the
background correction and the PTH lag correction can be accomplished in any number of ways. The
approach shown in Fig. 1A is to set up a program counter to test at program element 21 to determine if the
PTH injection correction has been made, and, if it has been made, to go ahead with sequence selection at
program element 25. If it has not been made, the background and lag corrections are repeated. Any of
several equivalent program counters can be used to keep track of whether the PTH injection correction has
been made. For example, for program element 21, one could use a counter for the background correction
subroutine 17, for the lag correction subroutine 19, or for the injection correction subroutine 23 itself.
Another less practical but equivalent approach, which is not illustrated, is to avoid using a program counter
and a program loop at all, and to simply repeat the background correction subroutine 17 and the PTH lag
correction subroutine 19 after the injection correction subroutine 23, before making the sequence selection.
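
The two-pass control flow described above can be summarized in a short sketch. It shows only the ordering of the steps; the four callables passed in are hypothetical placeholders for subroutines 17, 19 and 23 and program element 25, and a boolean flag plays the role of the program counter tested at program element 21.

```python
def run_analysis(pth_values, background, lag, injection, select):
    """Order of operations of Fig. 1A after quantitation (placeholder callables).

    background(data)           -> background-corrected data   (subroutine 17)
    lag(data)                  -> lag-corrected data          (subroutine 19)
    injection(raw, corrected)  -> injection-corrected raw data (subroutine 23)
    select(data)               -> sequence assignment         (program element 25)
    """
    injection_done = False          # stands in for the counter tested at element 21
    data = pth_values
    while True:
        corrected = lag(background(data))
        if injection_done:
            return select(corrected)
        # The injection correction is applied to the original measurements,
        # using the first-pass corrected values as a guide.
        data = injection(pth_values, corrected)
        injection_done = True


log = []


def _background(data):
    log.append("background")
    return data


def _lag(data):
    log.append("lag")
    return data


def _injection(raw, corrected):
    log.append("injection")
    return raw


def _select(data):
    log.append("select")
    return "sequence"


result = run_analysis([1.0], _background, _lag, _injection, _select)
```

Running the sketch with the logging stubs records the order background, lag, injection, background, lag, select, which is the sequence of program elements 17, 19, 23, 17, 19, 25.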
Each of the various subroutines and program elements will now be discussed in more detail by referring
to the various figures enumerated above in the Brief Description of the Drawings. In addition, further specific
details for program elements 17-25 can be found in Appendix I which provides a specific example of a
preferred source code for carrying out those program elements.



Quantitation of PTH Yields:



[...] quantitating HPLC peaks when the amount of peptide being sequenced is relatively large (i.e. when the
HPLC signal-to-noise levels are high), modern Edman sequencers can work with sample levels that severely
test these routines. HPLC peaks can be obscured by both high frequency noise (typically from the UV
absorbance detectors used to monitor PTH elution from the analytical columns) and low frequency noise
(typically from both temperature-induced detector drift and absorbance/refractive index changes that occur
during gradient HPLC elution). Hence, peak analysis should be preceded by chromatogram filtering to
minimize the effects of both types of noise.

According to the first embodiment, after each chromatogram is collected in digitized form by the
computer module 55, low frequency noise is filtered from it by an adaptation of the method described by
Goehner, Anal. Chem. 50, 1223 (1978). The entire chromatogram is first piecewise fitted to a polynomial
curve of degree N. Then all points that are more than selected amounts (in standard deviation units) above
or below the calculated curve are rejected, and a new fit is made using the remaining points. This
procedure continues until the standard deviation between actual and calculated points reaches a set
minimum level (i.e. the fitted curve has converged on the baseline of the chromatogram). The polynomial
coefficients of the calculated curve are then used to calculate a baseline point for each point in the original
chromatogram; the difference between the two sets of values represents the baseline-corrected
chromatogram.
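
The iterative fit-and-reject procedure can be sketched in a few lines. This is an illustration rather than the routine of the preferred embodiment: the rejection threshold (2 standard deviations), the convergence level (0.05), the low polynomial degree and the synthetic trace are all assumptions, and the least-squares fit is done with a small hand-written normal-equation solver so that the example needs only the standard library.

```python
import math


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest order first)."""
    n = degree + 1
    rows = [[sum(x ** (r + c) for x in xs) for c in range(n)]
            + [sum(y * x ** r for x, y in zip(xs, ys))]
            for r in range(n)]
    for col in range(n):                       # Gauss-Jordan with partial pivoting
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col:
                f = rows[r][col] / rows[col][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    return [rows[r][n] / rows[r][r] for r in range(n)]


def polyval(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))


def fit_baseline(ys, degree=2, reject=2.0, target_sd=0.05, max_iter=50):
    """Fit, reject points far from the curve, refit, until the fit converges."""
    xs = [i / (len(ys) - 1) for i in range(len(ys))]   # x scaled to [0, 1]
    keep = list(range(len(ys)))
    for _ in range(max_iter):
        coeffs = polyfit([xs[i] for i in keep], [ys[i] for i in keep], degree)
        resid = {i: ys[i] - polyval(coeffs, xs[i]) for i in keep}
        sd = (sum(r * r for r in resid.values()) / len(keep)) ** 0.5
        if sd <= target_sd:
            break
        kept = [i for i in keep if abs(resid[i]) <= reject * sd]
        if len(kept) == len(keep) or len(kept) <= degree + 1:
            break
        keep = kept
    baseline = [polyval(coeffs, x) for x in xs]
    return baseline, [y - b for y, b in zip(ys, baseline)]


# Synthetic trace: a drifting baseline from 1.0 to 1.5 with one peak of height 10.
true_base = [1.0 + 0.5 * i / 40 for i in range(41)]
trace = [b + 10.0 * math.exp(-((i - 20) / 2.0) ** 2) for i, b in enumerate(true_base)]
baseline, corrected = fit_baseline(trace, degree=2)
```

On this synthetic trace the peak points are rejected over successive rounds, so the final curve follows the drifting baseline and the corrected trace retains a peak close to its true height of 10.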

In practice, the number of original chromatogram points and the number of baseline slope changes
(which defines the degree N of the polynomial used for the fit) generally exceed the practical computing
power of microcomputers. Thus, the routine is performed sequentially on overlapping segments of the
chromatogram, with each round of the fit routine used to establish the background for a portion of the total
chromatogram. Typically, the first 3/9 of the chromatogram points are fit first and used to set the first 5/18
of the baseline points. Then, 3/9 of the points starting at the 2/9 position from the front end of the
chromatogram are fit and used to set the next 4/18 of the baseline points. Next, 3/9 of the points starting at
the 4/9 position from the chromatogram front are fit and used to set the next 4/18 of the baseline points.
Finally, the last 3/9 of the points are fit and used to set the last 5/18 of the baseline points. Since, where
possible, only the middle section of each calculated curve is used, discontinuities in the baseline-corrected
chromatogram where the individual fitted curves meet are minimal.
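
The segment scheme above can be checked with a small worked example. The sketch below simply converts the stated fractions into index ranges for a chromatogram of a given length; the function name and the choice of 180 points are assumptions made for illustration.

```python
def segment_plan(n_points):
    """Return (fit_start, fit_end, use_start, use_end) index ranges, end exclusive.

    Each segment is fit over 3/9 of the points; only part of each fitted curve
    is used to set baseline points (5/18, 4/18, 4/18 and 5/18 of the total).
    """
    fit_ninths = [(0, 3), (2, 5), (4, 7), (6, 9)]
    use_eighteenths = [(0, 5), (5, 9), (9, 13), (13, 18)]
    plan = []
    for (f0, f1), (u0, u1) in zip(fit_ninths, use_eighteenths):
        plan.append((n_points * f0 // 9, n_points * f1 // 9,
                     n_points * u0 // 18, n_points * u1 // 18))
    return plan


plan = segment_plan(180)
```

For 180 points the fitted ranges are 0-60, 40-100, 80-140 and 120-180, and the baseline ranges taken from them are 0-50, 50-90, 90-130 and 130-180. Each baseline range lies inside its fitted range, and the interior ranges sit in the middle of their segments, which is why the joins between fitted curves are smooth.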

This baseline calculation process of the first embodiment is illustrated in detail in Figs. 2A and 2B,
which begins at program element 211, where all program counters and constants are initialized. In
particular, at least three counters are used, say "i", "k", and "m". A counter i is used to index the particular
portion of the chromatogram being fit, a counter k is required to identify the particular Edman cycle being
fit, and a counter m is used to keep track of the iteration number of the fit. Similarly, it is necessary to pick
the degree N of the polynomial to be used to fit the chromatograms, and to decide on the closeness of fit
desired for the background calculation. As a practical matter, N is generally chosen to be about 6, and the
closeness of fit is generally chosen to be when the standard deviation of the fitted curve to the background
points is less than or equal to a constant K (typically twice the high frequency noise level of the detector
output). Once the constants and counters are initialized, the method continues at program element 213 by
fitting the i-th portion of a measured chromatogram $C_k(t)$ corresponding to the k-th Edman cycle. On the
first pass, i = k = 1 and a polynomial $P_{ik}^{m}(t, N)$ (i.e. $P_{11}^{1}(t, N)$ on the first iteration) of degree N is used to fit
$C_k(t)$, typically using a least squares approach. At program element 215, the standard deviation $\sigma_{ik}^{m}$ is
calculated for the fit, and a function $\delta(\sigma_{ik}^{m})$ is defined which corresponds to the maximum point-by-point
deviation allowed. Here $\delta(\sigma_{ik}^{m})$ is chosen to be $0.1\,\sigma_{ik}^{m}$. The magnitude of the standard deviation $\sigma_{ik}^{m}$ is tested
at program element 217 to see if it is less than or equal to the chosen constant K. If it is, the
counter i is incremented at program element 223. The counter i is then tested to see if it is greater than 4 at
program element 225. If it is not greater than 4, the next portion of the chromatogram of the k-th cycle is fit
in program element 213, etc.

If, at program element 217, the magnitude of the standard deviation is larger than K, the program
increments m, the iteration counter, at program element 219. Then at program element 221, a new function
$C_{ik}^{m}(t)$ (in this first pass $C_{11}^{2}(t)$) is calculated, hereinafter called the reduced chromatogram, by removing all
points from the chromatogram for which

$$\left| C_{ik}^{m-1}(t) - P_{ik}^{m-1}(t, N) \right| > \delta\left(\sigma_{ik}^{m-1}\right).$$

[...] through program elements 213 through 225 [...] until
all portions of the chromatogram of the k-th cycle are fit according to the fit criterion established.

Once the polynomial fits are completed for each portion i of the chromatogram of the k-th cycle, the
baseline of the k-th cycle is calculated at program element 227, the values of the fitting polynomials $P_{ik}^{l}(t)$
being used to calculate the baseline points $b_k(t)$ for each point in the measured chromatogram $C_k(t)$, where
l = max m (the last iteration) for the i-th portion. A background-corrected chromatogram is then
calculated at program element 229 by subtracting the calculated baseline points $b_k(t)$ from the measured
chromatogram $C_k(t)$, which corresponds to having removed the low-frequency background components
from the chromatogram. Those skilled in the art will understand that the above approach for filtering out low
frequency noise in this embodiment is but one of many equivalent approaches. For example, any complete
set of functions could be used for the fitting function rather than a polynomial.

Once these low frequency components have been stripped from the chromatogram, high frequency noise is removed at program element 231 using a standard fast Fourier transform filter as is known in the art. Then peaks in the filtered chromatogram are detected and quantitated at program element 232. Several approaches can be used. For example, time windows set on the basis of the observed elution times of each PTH in a standard mixture can be used to search the filtered chromatogram to determine the amount of each PTH represented in the chromatogram. The amounts can be determined by standard first derivative peak finding and peak integration procedures known in the art, or by fitting the points in each window to a Gaussian curve and calculating the area under the curve (see Kent et al. in Biotechniques 5, pp. 314-321 (1978)) or, more simply, the maximum point value in each window can be taken as the peak height for the corresponding PTH. The latter approach is faster but requires good separation of the PTHs by the HPLC.
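The simplest quantitation option described above, taking the maximum value inside each elution-time window, can be sketched as follows. This is only an illustration: the function name, the dictionary layout of the windows, and the example elution times are assumptions and do not come from the specification.

```python
import numpy as np

def peak_heights_in_windows(times, signal, windows):
    """Return the maximum signal value inside each elution-time window.

    `windows` maps a PTH label to a (start, end) time pair; the labels
    and window limits used below are hypothetical.
    """
    heights = {}
    for name, (t_start, t_end) in windows.items():
        mask = (times >= t_start) & (times <= t_end)
        # An empty window yields a height of zero (an assumption of this sketch).
        heights[name] = float(signal[mask].max()) if mask.any() else 0.0
    return heights

# Toy filtered chromatogram with two isolated peaks.
times = np.arange(0.0, 10.0, 0.5)
signal = np.zeros_like(times)
signal[4] = 3.0    # peak at t = 2.0
signal[12] = 7.0   # peak at t = 6.0
heights = peak_heights_in_windows(times, signal, {"D": (1.5, 2.5), "E": (5.5, 6.5)})
```

As the text notes, this shortcut is only reliable when the PTHs are well separated, since overlapping peaks falling in the same window cannot be told apart by a maximum alone.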

Performing the peak location and quantitation yields a transformed chromatogram PTH_j(k) for each cycle k, which corresponds to the amount of PTH of amino acid of kind j in the cycle k. The program then tests the cycle number at program element 233, and if all the cycles required to degrade the peptide in the sample are not completed, the cycle number k is incremented at program element 235 and the background-correction process begins again at program element 213 for a new cycle.

Figs. 3A(1) and 3A(2) show the results of using the technique illustrated in Figs. 2A and 2B for this first embodiment to remove low frequency noise. In Fig. 3A(1), a raw chromatogram C_k(t) is shown with its many peaks as a function of time for a 5-pmol PTH standard. The fitted baseline b_k(t) is shown as a relatively smooth dark solid line at the bottom of the curve C_k(t). Fig. 3A(2) shows the baseline-corrected chromatogram of the sample as determined according to the method of the invention at program element 229.

As an alternative preferred embodiment, another approach can be used to quantitate the chromatogram which utilizes the techniques of digital filtering. Although digital filtering is generally well known for smoothing and resolution enhancement of noisy spectra, and has been applied specifically to the physical measurements obtained by ENDOR (electron nuclear double resonance), it has apparently not been applied to chromatogram quantitation. (See "Variable Filter for Digital Smoothing and Resolution Enhancement of Noisy Spectra," by Bromba et al., Anal. Chem. (1984), 56, 2052-2058, and "Properties of a Variable Digital Filter for Smoothing and Resolution Enhancement," by Biermann et al., Anal. Chem. (1986), 58, 536-539, for a general discussion of digital filtering in noise reduction.) The particular approach disclosed in these references pertains to the use of digital filtering which is a discrete, linear, translation-invariant convolution operation defined by:

$$(Af)(k) = \sum_{n=-N}^{N} a(n)\, f(k-n)$$

where A denotes the filter operator, a(n) is the filter function (kernel), f is the unfiltered spectrum, and Af the filtered spectrum. In particular, the filter function is a vertically shifted triangular filter where

$$a(n) = \frac{2\alpha\,(N+1-|n|)}{N(N+1)} - \frac{2\alpha(N+1)-N}{N(2N+1)}, \qquad |n| \le N$$



For calculation purposes the filter function is typically separated into two parts:

$$a(n) = a_1(n) + a_2(n)$$

where

$$a_1(n) = \frac{2\alpha\,(N+1-|n|)}{N(N+1)}$$

is a triangular filter and

$$a_2(n) = -\frac{2\alpha(N+1)-N}{N(2N+1)}$$

is a rectangular filter. Hereinafter this combination will be called a "house" filter, because filter functions a1(n) and a2(n) when superposed on a graph have the appearance of a house. (See Fig. 3B(1) for an example of a triangular filter superposed on a rectangular filter, using α = 1, N = 9. Fig. 3B(2) shows the resulting house filter.)
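As a minimal sketch of the house filter, the kernel can be built from its triangular and rectangular parts as written above. The function name `house_kernel` and the use of NumPy are assumptions made for illustration; the specification does not prescribe an implementation.

```python
import numpy as np

def house_kernel(N, alpha):
    """Build the house filter a(n) = a1(n) + a2(n) for n = -N..N."""
    n = np.arange(-N, N + 1)
    # Triangular part a1(n).
    triangular = 2.0 * alpha * (N + 1 - np.abs(n)) / (N * (N + 1))
    # Rectangular part a2(n): a constant vertical shift applied to every tap.
    rectangular = -(2.0 * alpha * (N + 1) - N) / (N * (2 * N + 1))
    return triangular + rectangular

kernel = house_kernel(9, 1.0)   # the N = 9, alpha = 1 case mentioned for Fig. 3B
```

With these two parts the taps sum to one for any α, which is the area-preserving property the later baseline subtraction relies on.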

In this filter function, N is the filter width and α determines the degree of resolution enhancement. The basis for resolution enhancement with this filter is a reduction in line width, since the frequency response exceeds 1 for α > 1 at low frequencies. At higher frequencies the frequency response decreases rapidly and ensures that the high frequency noise in the signal is suppressed. The properties of the filter vary, of course, depending on α and N. Generally, α = 1/2 is considered to be particularly well suited for signal to noise enhancement of unknown spectrometry functions; it approximates a matched filter and also produces the best high frequency attenuation. Generally, α = 1 is the largest value which enables frequency responses not in excess of 1, so that α = 1 marks the boundary between smoothing filters and resolution enhancement filters. For α > 1 the frequency response of the filter increases beyond the frequency f = 0, and then falls off rapidly, providing general resolution enhancement. For α ≥ 1 resolution enhancement increases monotonically with α, but so does noise amplification.
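The role of α = 1 as the boundary between smoothing and resolution enhancement can be checked numerically. The sketch below evaluates the frequency response of the symmetric kernel, H(ω) = Σ a(n) cos(ωn); the helper names and the choice N = 2 are illustrative assumptions.

```python
import numpy as np

def house_kernel(N, alpha):
    """House filter taps a(n), n = -N..N (triangular plus rectangular part)."""
    n = np.arange(-N, N + 1)
    triangular = 2.0 * alpha * (N + 1 - np.abs(n)) / (N * (N + 1))
    rectangular = -(2.0 * alpha * (N + 1) - N) / (N * (2 * N + 1))
    return triangular + rectangular

def frequency_response(kernel, omegas):
    """Real frequency response of a symmetric kernel at angular frequencies omegas."""
    N = (len(kernel) - 1) // 2
    n = np.arange(-N, N + 1)
    return np.array([np.sum(kernel * np.cos(w * n)) for w in omegas])

omegas = np.linspace(0.0, np.pi, 181)                          # 0 to pi in 1-degree steps
smoothing = frequency_response(house_kernel(2, 1.0), omegas)   # alpha = 1
enhancing = frequency_response(house_kernel(2, 2.0), omegas)   # alpha = 2
```

For this small example the α = 1 response never rises above its value of 1 at zero frequency, while the α = 2 response climbs above 1 away from zero frequency, in line with the description of the two regimes.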

Although these digital filtering techniques could be applied directly to the chromatogram quantitation problem, the result would still contain residual low frequency noise, due to systematic errors inherent in the chromatographic technique at the time of measurement. To avoid that problem, an alternative approach is used which constitutes a second embodiment of the invention. This second approach is illustrated in the flow chart of Figs. 2C and 2D.

As for the previous embodiment, the calculation begins by initializing constants and counters, this time at program element 238. In particular the counter k corresponding to the Edman cycle number is set to 1, so that the calculation can proceed cycle by cycle. At program element 240, the filter parameter α is generally set equal to a number in the interval between 1/2 and 1, and in the preferred mode is set equal to 1. The filter width N is set to match the chromatogram peak width, so that in the preferred mode, the filter function approximates a Savitzky-Golay filter, corresponding to smoothing without resolution enhancement. At program element 242, the measured chromatogram for cycle k, C_k(t), is filtered by convolving it with the house filter kernel. This generates a smoothed chromatogram C_1(t), i.e. high frequency noise has been filtered out. Next, α is set equal to a number greater than 1 for resolution enhancement at program element 244, which in the preferred mode in this case has been set equal to 20. At program element 246, the smoothed chromatogram C_1(t) is convolved with the new house kernel to yield an enhanced smoothed chromatogram C_2(t). This chromatogram typically has substantially the same baseline as the smoothed chromatogram C_1(t) but with enhanced peaks and negative sidelobes, since the house kernel is area preserving. As a result, by subtracting the smoothed chromatogram C_1(t) from the enhanced smoothed chromatogram C_2(t) at program element 248, one obtains a chromatogram C_3(t) which preserves only the peak and sidelobe data. Furthermore, the low frequency baseline noise has been completely removed, thereby eliminating an important source of error in quantitating the chromatograms. Since the peaks are the only information of interest, the sidelobes are chopped off at program element 250, to yield a new chopped chromatogram. A new house kernel is then calculated, and this new kernel is used to smooth the chopped chromatogram. At program element 256, the peaks of the resulting chromatogram are detected and quantitated, and at program element 258, these peaks are compared with known reference peaks for different amino acids, in order to correlate each of the peaks in the particular Edman cycle with particular amino acids, thereby obtaining the quantitated values PTH_{j,0}(k). At program element 260, the cycle number k is tested to see if it is greater than or equal to the number of amino acids in the peptide. If it is, the program continues to subroutine 17. If not, the cycle number is incremented at program element 262, and the chromatogram quantitation begins again at program element 240 for the new cycle. The reference peaks used for comparison above are obtained by running chromatograms on pure samples of each amino acid and running the filtering routine for each of them to obtain the filtered reference chromatograms for comparison with the sample chromatograms.
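The filtering sequence of this second embodiment (smooth, enhance, subtract, chop, smooth again) can be sketched as below. This is an illustration under stated assumptions rather than the patented program: the function names, the edge padding used at the ends of the record, and the synthetic test signal are all hypothetical, while the parameter values follow those quoted in the text for Figs. 3D-3H (N = 3 with α = 1, then α = 20, then N = 2 with α = 1/2).

```python
import numpy as np

def house_kernel(N, alpha):
    """House filter taps a(n), n = -N..N."""
    n = np.arange(-N, N + 1)
    triangular = 2.0 * alpha * (N + 1 - np.abs(n)) / (N * (N + 1))
    rectangular = -(2.0 * alpha * (N + 1) - N) / (N * (2 * N + 1))
    return triangular + rectangular

def house_filter(signal, N, alpha):
    """Convolve a signal with the house kernel, keeping the original length."""
    # Assumption of this sketch: the ends are padded by repeating the edge values.
    padded = np.pad(signal, N, mode="edge")
    return np.convolve(padded, house_kernel(N, alpha), mode="valid")

def baseline_corrected(measured, N=3, alpha_smooth=1.0, alpha_enhance=20.0,
                       N_final=2, alpha_final=0.5):
    smoothed = house_filter(measured, N, alpha_smooth)    # remove high frequency noise
    enhanced = house_filter(smoothed, N, alpha_enhance)   # resolution enhancement
    difference = enhanced - smoothed                      # the common baseline cancels
    chopped = np.maximum(difference, 0.0)                 # chop off negative sidelobes
    return house_filter(chopped, N_final, alpha_final)    # final smoothing

# Synthetic chromatogram: one Gaussian peak riding on a sloping baseline.
t = np.arange(200.0)
peak = 10.0 * np.exp(-0.5 * ((t - 100.0) / 3.0) ** 2)
measured = 5.0 + 0.01 * t + peak
result = baseline_corrected(measured)
```

Because the kernel is symmetric and its taps sum to one, a linear baseline passes through both filters unchanged away from the ends of the record, so the subtraction leaves only the peak and its sidelobes. A strongly curved baseline would cancel only approximately.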

Figs. 3C-3H show the results of applying the above smoothing and baseline subtraction method to the standard PTH mix having the raw chromatogram of Fig. 3A(1). Fig. 3C shows the raw chromatogram of Fig. 3A(1) on a reduced scale. Fig. 3D shows the results of filtering the chromatogram of Fig. 3C using a house filter with N = 3, α = 1, to remove high frequency noise. Fig. 3E shows the results of applying the house filter again (i.e. applying it to the filtered chromatogram of Fig. 3D), this time with N = 3, α = 20. Fig. 3F shows the results of subtracting the chromatogram of Fig. 3D from the chromatogram of Fig. 3E. Fig. 3G shows the chromatogram of Fig. 3F after chopping at zero. Fig. 3H shows the chromatogram of Fig. 3G after smoothing with the house filter with N = 2 and α = 1/2. Fig. 3I shows the results of scaling the chromatogram of Fig. 3H (dividing by an arbitrary scale factor of 3) to obtain a peak height more comparable to the peak height of Fig. 3C in order to compare the two curves. To illustrate the sensitivity of the method, these results are shown using an expanded scale in Figs. 3C', 3D', and 3I', which correspond to Figs. 3C, 3D, and 3I, respectively.

To illustrate further the sensitivity of the method using the house filter, the results of applying the method to a 20 pmole PTH standard are shown in Figs. 3J-3M. Fig. 3J shows the raw chromatogram for the sample. Fig. 3K shows the raw chromatogram on an expanded time scale in the time interval between 14.0 and 16.6 minutes, which clearly reveals a substantial high frequency noise content in the measured chromatogram. Fig. 3L shows the results of using the house filter to remove the high frequency noise, with N = 6, α = 1. Fig. 3M shows the results of applying the entire digital filtering method to the raw chromatogram of Fig. 3J, and illustrates that the final filtered chromatogram is quite smooth and exhibits peaks that are quite well resolved. The subsequent filtering parameters used were N = 6, α = 20 for the resolution enhancement, and N = 3, α = 1/2 for the final smoothing.

PTH Background Correction: 

Processing all of the chromatograms from the sequencing run with the routines described above in Figs. 2A and 2B or 2C and 2D produces a data array containing raw values for all of the PTHs at all of the cycles. A plot of any individual PTH versus cycle typically shows a rising and/or falling background level of the PTH with one or more cycles where the PTH value is substantially higher than this background level. This background level can be stripped from the remaining PTH yields by a variation of the recursive least squares fit to a polynomial routine used above for low frequency filtering of the chromatograms. In this variation, the iterations of the fit algorithm are continued until the ratio of the standard deviation between actual and fitted data points for successive iterations is above a set value, say S. Once the iterations are concluded, the calculated background PTH values are subtracted from the raw values to yield background-corrected PTH values.

Next, an estimate of the dispersion of this background corrected data is made by performing three iterations of the least squares polynomial fit routine (with degree of polynomial = 1). The standard deviation of the last iteration then provides an estimate of the variation in the background level, an estimate that allows assignment of a probability that elevated levels at particular cycles are indeed high by a statistically relevant amount. Once the iterations are concluded, the calculated background-corrected PTH values are divided by the standard deviation of the background fit to obtain normalized background-corrected PTH values. This process is then repeated for each PTH so that the remaining PTH values are also expressed in units of standard deviation above (or below) the values calculated from the fitted background curves. At this point a preliminary sequence assignment is made for each cycle by picking the PTH whose background-corrected normalized value is the largest.

The background correction begins at a program element where the required constants and program counters are initialized. A program counter j is initialized to the value 1. The counter j is used to index the PTH values to denote an amino acid of kind j. An integer M is set equal to the number of cycles divided by 12 (rounded to the nearest integer), which corresponds to the degree of the polynomial that is used to fit the PTH values. Also, an array σ_{j,l} is set equal to zero for all j = 1 to 20. In addition, the value of S is set equal to 2, S being the cutoff value of the ratio of the standard deviation between actual and fitted data for successive iterations, in order to determine when the iterations can be terminated in the least squares fit routine. At program element 415, another counter "l" is initialized to zero, the counter l corresponding to the iteration number in the least squares fit routine. At program element 417, PTH_{j,l}(k) is fit with a polynomial Q_{j,l+1}(M,k) of degree M. Initially this means PTH_{1,0}(k) (the totality of values for amino acid number 1 as a function of cycle) is fit with a polynomial Q_{1,1}(M,k). At program element 419, the standard deviation of the fit for the j-th amino acid on the l-th iteration is calculated and is called σ_{j,l} (i.e. σ_{1,0} initially), and a measure of the fit function Δ(σ_{j,l}) is calculated. Here Δ(σ_{j,l}) is defined to be the standard deviation σ_{j,l} for points higher than the raw data and twice the standard deviation for points below the raw data. The ratio of σ_{j,l-1} to σ_{j,l} is then calculated at program element 421, and, if the ratio is greater than S, the program proceeds to calculate the background-corrected normalized PTH amounts at program element 425. If the ratio is less than S, the counter l is incremented at program element 422, and at program element 423, a new PTH_{j,l}(k) is calculated by removing those points (i.e. cycles) where |PTH_{j,l-1}(k) - Q_{j,l}(k)| > Δ(σ_{j,l}). Here PTH_{j,l}(k) = PTH_{j,0}(k) for all points k common to the domains of both functions. Once the points are removed, the fitting routine begins again at program element 417, and the iteration process is continued until the ratio σ_{j,l-1}/σ_{j,l} is greater than S.

An iteration counter m is then initialized at 424, and, as indicated earlier, at program element 425, the background-corrected raw PTH amounts, PTH'_{j,0}(k), are calculated for each cycle. This is done by subtracting the last fitted value Q_{j,l}(k) for the PTH of amino acid j at cycle k from the measured PTH amount PTH_{j,0}(k). At program element 429, the background-corrected raw PTH amount PTH'_{j,0}(k) is fitted with a first degree polynomial Z_{j,m+1}(k) (Z_{j,1}(k) for the first iteration). The standard deviation of the fit, σ_{j,m}, and a measure of the fit function Δ(σ_{j,m}) are then calculated at program element 431. Here Δ(σ_{j,m}) is typically chosen to be equal to the standard deviation for points falling above the fitted line and to be equal to 1.67 times the standard deviation for points falling below the fitted line. The program then tests the value of the iteration counter m at program element 433 in order to stop the iterations at three for this particular sequence of fits. The iteration counter is then incremented at program element 435, and at program element 437 a new PTH'_{j,m}(k) is calculated by removing those points from the background-corrected raw values where the difference between PTH'_{j,0}(k) and the fitted value is greater than the measure of the fit function. This iteration process is then continued twice more, i.e. for a total of three times, and after the third time a normalized background-corrected PTH amount is calculated at program element 439 by dividing the background-corrected raw value by the standard deviation calculated on the last iteration. Once the background-corrected normalized PTH amounts are calculated for amino acid j, the counter j is incremented at program element 441 and j is tested to see if it is less than or equal to 20 at program element 442. If j is less than or equal to 20, the program loops back through the background fit and correction process, starting again with program element 415. If j is greater than 20, the method then proceeds to program element 443 where a preliminary sequence assignment is made by finding the maximum normalized PTH amount for each cycle.
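The dispersion estimate and normalization step (three first-degree fits with asymmetric point rejection, then division by the last standard deviation) might be sketched as follows. The function name, the use of `numpy.polyfit`, and the synthetic data are assumptions for illustration; the sketch also assumes enough points survive each rejection to support a straight-line fit and that the final standard deviation is nonzero.

```python
import numpy as np

def normalize_by_dispersion(cycles, corrected, n_fits=3, below_factor=1.67):
    """Express background-corrected PTH values in units of background scatter."""
    cycles = np.asarray(cycles, dtype=float)
    corrected = np.asarray(corrected, dtype=float)
    keep = np.ones(corrected.size, dtype=bool)
    sigma = 0.0
    for fit_number in range(n_fits):
        # First-degree least squares fit to the points currently retained.
        slope, intercept = np.polyfit(cycles[keep], corrected[keep], 1)
        residual = corrected - (slope * cycles + intercept)
        sigma = residual[keep].std()
        if fit_number < n_fits - 1:
            # Reject points more than 1 sigma above, or 1.67 sigma below, the line.
            keep = (residual <= sigma) & (residual >= -below_factor * sigma)
    return corrected / sigma

# Hypothetical single-PTH trace: small deterministic scatter plus one real signal.
k = np.arange(1, 41, dtype=float)
values = np.sin(1.7 * k) + 0.5 * np.cos(0.9 * k)
values[9] += 50.0   # elevated yield at cycle 10
normalized = normalize_by_dispersion(k, values)
```

The tighter limit on the high side means a genuine signal is discarded quickly and does not inflate the scatter estimate, which is why the surviving standard deviation reflects the background alone.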

Those skilled in the art will understand that there are many ways of performing the background correction. For example, different fitting functions may be used, and a measure of dispersion different from the standard deviation may be used for fitting. The important aspect of this background-correction process is to arrive at a normalized sequence of PTH amounts, so that results from cycle to cycle can be quantitatively compared.

Lag Correction: 

After the normalized background-corrected PTH amounts have been calculated and a preliminary sequence assignment has been made, the lag correction is performed. At each cycle, k, of the Edman degradation, the removal of the amino acid residue is only partial, i.e. a fraction of the amino acid appears at subsequent cycles k + 1, k + 2, ..., k + i. At any given cycle, the coupling and/or cleavage failure typically adds 1 to 2% to the out-of-phase signal, or lag. Since these failures are cumulative, the observed lag becomes progressively larger as the sequence proceeds, and in long runs more signal may appear in cycle n + 1 than in cycle n. Hence, the lag correction is particularly important for accurate sequence assignments.

$$\begin{aligned}
Y_{1,3} &= 1\,Y_0\,(1-b)\,b^2 \\
Y_{1,4} &= 1\,Y_0\,(1-b)\,b^3
\end{aligned}$$

Similarly, for amino acids 2 and 3:

$$\begin{aligned}
Y_{2,2} &= 1\,Y_0\,(1-b)^2 \\
Y_{2,3} &= 2\,Y_0\,(1-b)^2\,b \\
Y_{2,4} &= 3\,Y_0\,(1-b)^2\,b^2 \\
Y_{2,5} &= 4\,Y_0\,(1-b)^2\,b^3 \\
Y_{3,3} &= 1\,Y_0\,(1-b)^3 \\
Y_{3,4} &= 3\,Y_0\,(1-b)^3\,b \\
Y_{3,5} &= 6\,Y_0\,(1-b)^3\,b^2 \\
Y_{3,6} &= 10\,Y_0\,(1-b)^3\,b^3
\end{aligned}$$

If the lag yields are expressed as a ratio to the observed primary cycle yield, these expressions reduce

In general terms, if Y_k is the primary cycle yield (i.e. Y_{k,k}) and Y_{k,i} is the lag yield (i.e. Y_{k,k+i}), then:

$$\begin{aligned}
Y_{k,1}/Y_k &= ([k+0]/1)\,b \\
Y_{k,2}/Y_k &= ([k+1]/2)([k+0]/1)\,b^2 \\
Y_{k,3}/Y_k &= ([k+2]/3)([k+1]/2)([k+0]/1)\,b^3 \\
&\;\;\vdots \\
Y_{k,i}/Y_k &= ([k+i-1]/i)\cdots([k+2]/3)([k+1]/2)([k+0]/1)\,b^i
\end{aligned}$$



This expression is not strictly correct because of the assumptions that irreversible signal losses are 
nonexistent and that the failure fraction b is the same at each cycle. However, the former assumption 
introduces a relatively small error as long as the irreversible losses are less than 10% per cycle and 
therefore does not interfere significantly with subsequent calculations. The effect of the latter assumption, 
which is clearly incorrect, is more difficult to evaluate. Empirically, it does not seem to interfere when b is 
measured at each cycle as a cumulative average lag. 
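The lag expressions can be checked numerically. The product ([k+i-1]/i)...([k+0]/1) equals the binomial coefficient C(k+i-1, i), which reproduces the integer coefficients listed for amino acids 2 and 3. The sketch below uses hypothetical function names and assumes, as the text does, a constant failure fraction b and no irreversible losses.

```python
from math import comb

def lag_ratio(k, i, b):
    """Ratio of the lag yield i cycles after cycle k to the primary yield."""
    ratio = 1.0
    for m in range(1, i + 1):
        ratio *= (k + m - 1) / m   # factor ([k + m - 1] / m) of the product
    return ratio * b ** i

def lag_yield(k, i, b, y0=1.0):
    """Absolute yield of amino acid k at cycle k + i under the same model."""
    return comb(k + i - 1, i) * y0 * (1 - b) ** k * b ** i

# Coefficients for amino acids 2 and 3 at lags 0..3.
coefficients_2 = [comb(2 + i - 1, i) for i in range(4)]
coefficients_3 = [comb(3 + i - 1, i) for i in range(4)]
```

Here `coefficients_2` and `coefficients_3` come out as 1, 2, 3, 4 and 1, 3, 6, 10, matching the tabulated expressions, and dividing `lag_yield(k, i, b)` by `lag_yield(k, 0, b)` gives the same value as `lag_ratio(k, i, b)`.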

The preferred method of the lag correction is illustrated in Figs. 5A, 5B, and 5C. At program element 511, the preliminary sequence assignment determined from subroutine 17 is used to define the primary cycle yield array Y_k (i.e. Y_k = MAX(PTH_j(k))). This preliminary sequence assignment is then used to calculate a working value for cumulative lag, kb, at each cycle. This calculation is set out in program elements 513, 515, and 517. First, at program element 513, the lag yield Y_{k,k+1} is set equal to PTH_j(k+1) for all Edman cycles in the sample, where j is chosen to correspond to the amino acid selected in the preliminary sequence assignment for cycle k. A cycle counter "n" is then initialized at program element 515 so that each cycle is corrected one at a time, and the working values for the lag coefficients are calculated for each remaining cycle (i.e. where k ≥ n) at program element 517 using the formula Y_{k,k+1}/Y_k = kb(k). In program element 519, these working values of kb(k) are then fitted to a polynomial curve B(k) of degree N/15 (rounded to the nearest integer) using the method of least squares. Then the standard deviation σ of the measured values of the lag coefficients about the fitted curve over the domain N of all Edman cycles is calculated at program element 521. At program element 523, all points k for which the actual value Y_{k,k+1}/Y_k differs from B(k), the fitted value, by more than one standard deviation σ are removed from the domain N, forming a new domain N'. A least squares polynomial fit of kb(k) is then performed at program element 525 using the new domain. At program element 527, the revised fitted lag values B'(k) are generated for all cycles k = 1 to N.

Then the failure fraction b(n) is calculated for cycle n at program element 529, using the fitted values B'(k), and at program element 531 b(n) is used to generate the fitted lag amounts G(n,i,b) into the next few cycles using the equation G(n,i,b) = ([n+i-1]/i)...([n+2]/3)([n+1]/2)([n+0]/1) b^i(n) for all cycles i until G(n,i,b) < 0.01, i.e. until the cycle n + i yield is less than 1% of the cycle n yield. At program element 533, the normalized PTH values are sorted for cycle n to find the three largest values, Y1, Y2, and Y3, and their corresponding lags at cycles n + i. At program element 535, the largest value is tested to see if it is less than three times as large as the next highest value, and if it is not (i.e. it is greater than or equal to 3 times the next highest value), the program leaves the amino acid assignment as it was by setting Y_n equal to Y1 at program element 537. If, however, Y1 is less than 3 times Y2, each of the three largest values Y1, Y2, Y3 is lag corrected in program elements 535 through 557 using the fitted lag based on the original sequence assignment to determine if the lag correction would make any changes in sequence assignment. In particular, each of the fitted lags is calculated as G(n,i,b) times the corresponding value at cycle n at program element 543, and if the actual lag at cycle n + i is greater than zero, the yield is corrected for the lag in cycle n + i. This correction is made by adding the fitted lag to the yield if the actual lag exceeds the fitted lag, or by adding the actual lag to the yield if the fitted lag exceeds the actual lag, at program element 547. This correction process is continued for the next few cycles i past the cycle number n, the criterion for cutoff being that the fitted lag be less than 1% of the actual value, as is tested at program element 549. This process continues for each cycle i and each of the three values until each value has been replaced by the sum of that value and its lag corrections over the cycles i.

At program element 553, each of these lag corrections is tested to find the maximum lag corrected value and the sequence value Y_n is set equal to that maximum, and the amino acid index is determined in order to select the proper sequence call for that cycle n. Once the sequence call for cycle n is completed, either after program element 537 or after element 553, the fitted lags for that sequence call are calculated at program element 559, for the next few cycles past the cycle n, again using the same termination criterion as before, i.e. until the fitted lag is less than 1% of the actual lag. Cycle n is then corrected for lag from the later cycles n + i that have positive lag values using the same cutoff criteria as before at program element 561 by increasing the cycle n yield by the lesser of the calculated lag amount or the actual cycle yield. Next, cycles n + i are corrected for lag from cycle n by replacing the lag value at cycle n + i by the greater of zero or the difference between the observed yield Y_{n,n+i} and the calculated lag amount at program element 563 for all cycles i where the fitted lag coefficient is less than 1%.
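A single lag-correction step for one cycle can be sketched as follows: generate the fitted lag coefficients G(n,i,b) until they drop below 1%, add back to cycle n the lesser of the fitted and observed lag, and subtract the fitted lag from the later cycles without going below zero. The function names, the `max_lag` safety cap, and the example numbers are assumptions of this sketch, not part of the specification.

```python
from math import comb

def fitted_lag_coefficients(n, b, cutoff=0.01, max_lag=50):
    """G(n, i, b) for i = 1, 2, ... until the coefficient drops below the cutoff."""
    coefficients = []
    for i in range(1, max_lag + 1):          # max_lag is a safety cap (assumption)
        g = comb(n + i - 1, i) * b ** i
        if g < cutoff:
            break
        coefficients.append(g)
    return coefficients

def correct_cycle(primary, observed_lags, coefficients):
    """Return the lag-corrected primary yield and the residual later-cycle values."""
    corrected = primary
    residual = list(observed_lags)
    for i, g in enumerate(coefficients):
        if i >= len(residual):
            break
        fitted = g * primary
        actual = residual[i]
        if actual > 0:
            corrected += min(fitted, actual)       # lesser of fitted and observed lag
        residual[i] = max(0.0, actual - fitted)    # never drive a later cycle negative
    return corrected, residual

# Hypothetical cycle 5 with failure fraction b = 0.02 and a primary yield of 100.
coefficients = fitted_lag_coefficients(5, 0.02)
corrected, residual = correct_cycle(100.0, [10.0, 0.6], coefficients)
```

In this example only the first lag coefficient (5b = 0.1) exceeds the 1% cutoff, so the 10 units observed at cycle 6 are restored to cycle 5 and the value two cycles later is left untouched.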

At this point, cycle n is fixed and cycle n + 1 is free from lag from cycle n that would interfere with the sequence assignment of cycle n + 1. At program element 555, the amino acid assignments are corrected based on the calculated lag corrections and at program element 567 the cycle counter n is incremented to calculate the lag for the next cycle. The cycle counter is tested at program element 569 to see if all cycles have been corrected. If they have not, the method returns to program element 517 to determine the empirical lag calculations. The polynomial fit routines 519-525 are repeated based on the new sequence assignments and lag coefficients, and the process is continued until the lag for cycle n + 1 is corrected and the lag from cycle n + 1 is removed from the next few cycles. This procedure continues until all cycles, one at a time, are corrected for lag. As the lag corrections are made during each pass through the procedure, one more cycle is made free of lag interference with its amino acid assignment, until eventually all cycles can be assigned independent of lag effects. The process then proceeds to the injection correction subroutine 23, or the results of the sequence assignment are output directly depending on whether the injection correction has been made. Those skilled in the art will understand that there are other approximations that can be made in arriving at the effects of the lag correction. For example, instead of using the highest three values of the normalized PTH's, one could use just the highest value, or one could use the two highest values to see if the sequence assignment changes, and if it does, go back and check for other sequence choices. Similarly, one could choose more than the three highest values if after lag correction, the
Injection Correction:



Once the background and lag corrections have been made, the remaining PTH values at each cycle can, if desired, be used to correct for any variation in the amount of sample injected onto the PTH analyzer at each cycle. For each cycle, all but the two highest PTH values are averaged. Since the corrected PTH values are in standard deviation units, this average would be near zero if the injection for any given cycle were precise. Any nonzero average for a cycle is used to correct the raw PTH yield data for that cycle by subtracting from each raw PTH value the product of the corrected cycle average and that PTH's standard deviation unit (calculated in the PTH background correction routine). This procedure, in effect, uses the set of nonassigned amino acid values at each cycle as an internal sampling standard. Thus, injection corrections can be made in the absence of any added internal HPLC standard.

Shown in Fig. 6 is a flow chart illustrating the injection correction. First, at program element 611, an array {Z_{j,n}} is defined, where the element Z_{j,n} represents the value of the PTH amount of amino acid of kind j at Edman cycle n, as calculated from the lag correction subroutine 19. At program element 613, the array is sorted by amino acid to determine the two largest values, say Z_{j',n} and Z_{j'',n}, for each cycle. The array is then averaged at program element 615, for those amino acids other than the two highest, yielding a column array defined as {Z_n}. An injection corrected PTH value is then calculated at program element 617 by the equation INJ_j(n) = PTH_{j,0}(n) - Z_n σ_{j,l}, where PTH_{j,0}(n) is the raw PTH value and σ_{j,l} is the standard deviation found in subroutine 17 on the last iteration l for amino acid j. Finally, at program element 619, the array PTH_{j,0}(n) is set equal to INJ_j(n) to set up the name required for the PTH array to be used in subroutine 17.
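The injection correction reduces to a few array operations. The sketch below assumes the data are held as cycle-by-amino-acid arrays; the function name, array layout, and toy numbers are illustrative assumptions only.

```python
import numpy as np

def injection_correct(raw, normalized, sigma):
    """Apply the per-cycle injection correction.

    raw, normalized: arrays of shape (n_cycles, n_amino_acids)
    sigma: per-amino-acid standard deviation from the background fit
    """
    ordered = np.sort(normalized, axis=1)
    # Average all but the two highest normalized values in each cycle.
    cycle_average = ordered[:, :-2].mean(axis=1)
    # Subtract (cycle average x that PTH's standard deviation) from each raw value.
    return raw - cycle_average[:, None] * sigma[None, :]

# Toy data: 2 cycles, 5 amino acids.
normalized = np.array([[10.0, 0.5, 0.5, 0.5, 8.0],
                       [0.0, 0.0, 0.0, 12.0, 1.0]])
raw = np.full((2, 5), 100.0)
sigma = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
adjusted = injection_correct(raw, normalized, sigma)
```

In the first toy cycle the non-assigned values average 0.5 standard deviation units, so every raw value is lowered by half of its own standard deviation. The second cycle averages zero and is left unchanged.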

Once all the injection corrections to the raw PTH data set are made, the PTH background and lag corrections must be recalculated using this adjusted data. Once this is done, the subsequent amino acid assignments should be as error free as is possible given the starting chromatograms.

Utility of the Invention 

Appendix II is a sequence of Tables that illustrate, at several steps of the method, results of the series of corrections described above. Table 1 shows the raw data resulting from quantitation of 60 cycles of an Edman sequencing of an 18 kilodalton chain (i.e. through program element 15). Table 2 shows the same raw data with the first column, aspartate, background-corrected according to program element 17. Table 3 shows the results of the next loop through the background correction subroutine 17 in order to correct for background in the second column, asparagine. As described earlier, this background correction process is repeated until the PTH amounts for each amino acid are background-corrected. Then a preliminary sequence assignment can be made. Table 4 shows the results of the background correction.

The circled elements in Table 4 are the maximum PTH values in each cycle and correspond to a first preliminary sequence assignment.

Using the preliminary sequence assignment, the lag correction subroutine 19 is then performed, which is followed by the injection correction subroutine 23. Once the injection correction is completed, the injection-corrected sequence of PTH's is then background-corrected at program element 17. The results of the lag correction on the background-corrected data for cycles 1-37 are shown in Table 5. For comparison, Table 6 shows the results of the lag correction on the background-corrected data for cycles 1-38. Table 7 shows the results of the lag correction for all 60 cycles. The results of the injection correction for cycle 1 are shown in Table 8. Table 9 shows the results of injection correction for cycles 1 and 2. This injection correction process is then continued for each cycle until all cycles are lag corrected. The results are shown in Table 10. Table 11 shows the results of background correction on the injection-corrected data of Table 10. Table 12 shows the results of lag correction for all cycles of the background-corrected data of Table 11. Again, the maximum values in each cycle correspond to the sequence assignment, which is seen to be different from the preliminary sequence assignment shown in Table 4, now that the injection correction has been performed.

This difference in sequence assignment can be seen more clearly in Figs. 7A, 7B, and 7C, which show [...] performing the lag correction as seen in Fig. 7C. The effects of the correction process can also be viewed in terms of a particular amino acid, as illustrated in Figs. 8A, 8B, and 8C, which show the results of the correction process for glutamic acid by cycle. Fig. 8A shows the raw data (injection corrected) for glutamic acid by cycle. Fig. 8B shows the results after background correction of the data of Fig. 8A, and Fig. 8C shows the results after both background and lag correction.

Those skilled in the art will appreciate that there are many equivalent ways of implementing the above method and different combinations of apparatus can be used to accomplish the method. For example, many of the computational program elements could be implemented separately from the computer module as long as the apparatus used for those computations were under appropriate control of the computer module. In addition it should be appreciated that the particular program counters and constants chosen in the preferred embodiment may vary, for example depending on the number of cycles being degraded, on the desired accuracy of the calculations, and the desired time to complete the sequence call of the peptide. For these and other reasons, it is intended that the scope of the invention be interpreted with reference to the appended claims and equivalents thereto and not be limited to the specific example chosen to describe it.



Claims 

1. A method of obtaining a baseline corrected chromatogram using a discrete, linear, translation invariant filter function a(n), where N is a measure of the filter function width and α is a parameter whose value determines signal to noise characteristics of the filter function, comprising the steps of:
performing a chromatographic analysis of a sample to obtain a first chromatogram;
filtering said first chromatogram with the filter function, with N set to approximate the width of peaks obtained in said first chromatogram, and α set to filter out high frequency noise from said first chromatogram to obtain a second chromatogram having a first filtered baseline;
filtering said second chromatogram with the filter function, with N set to approximate the width of peaks obtained in said first chromatogram and α set to resolution enhance peaks in said second chromatogram to obtain a third chromatogram having a baseline which is substantially the same as said first filtered baseline; and
subtracting said second chromatogram from said third chromatogram to obtain a fourth chromatogram which is baseline corrected.

2. The method of claim 1 further comprising:
truncating said fourth chromatogram for values below a preselected threshold to obtain a fifth
chromatogram;
filtering said fifth chromatogram with the filter function with $\alpha$ set to remove high frequency noise and N
set equal to the width of peaks in said fifth chromatogram to obtain a sixth chromatogram;
detecting peaks in said sixth chromatogram; and
measuring the height of said peaks in said sixth chromatogram.

3. The method of claim 2 wherein said filter function is a shifted triangular filter function proportional to

$$\frac{2\alpha\,(N+1-|n|)}{N(N+1)} - \frac{2\alpha(N+1)-N}{N(2N+1)}$$



4. The method of claim 2 wherein peaks detected in said sixth chromatogram are compared with peaks
in a reference chromatogram to identify the sample.

5. The method of claim 3 wherein said sample comprises a mixture of amino acid derivatives. 

6. The method of claim 4 wherein said filter function is a shifted triangular filter function proportional to

$$\frac{2\alpha\,(N+1-|n|)}{N(N+1)} - \frac{2\alpha(N+1)-N}{N(2N+1)}$$

7. An apparatus for obtaining a baseline corrected chromatogram of a sample having a plurality
[...] signal hereinafter called a raw
chromatogram;
computer means coupled to said detection means for analyzing said raw chromatogram, said computer
means comprising:
filter means for filtering said raw chromatogram with a first filter function $a_N^\alpha$ that is a discrete, linear,
translation invariant function, where N is a measure of the filter function width and $\alpha$ is a parameter whose
value determines signal to noise characteristics of the filter function;
control means for applying said filter means to said raw chromatogram with N set to approximate the
width of peaks obtained in said raw chromatogram, and $\alpha$ set to filter out high frequency noise from said
raw chromatogram to obtain a second chromatogram having a first filtered baseline;
said control means also including means for applying said filter means to said second chromatogram
with N set to approximate the width of peaks obtained in said first chromatogram and $\alpha$ set to resolution
enhance peaks in said second chromatogram to obtain a third chromatogram having a baseline which is
substantially the same as said first filtered baseline;
said control means also including means for subtracting said second chromatogram from said third
chromatogram to obtain a fourth chromatogram which is baseline corrected.

8. A method for synthesising a peptide in accordance with a predetermined amino acid sequence,
characterised in that the sequence has been determined by a method according to any one of claims 1 to
Fig. 1A

Perform Edman degradation for cycle k of peptide to be sequenced
Perform chromatogram; store results
Increment k until all cycles completed
Quantitate chromatogram
Reference numerals: 15
Fig. 1B (apparatus diagram; legible labels: 53, Model 470A, Model 120A)



Fig. 2A

Start
Initialize constants and counters
Fit portion i of measured chromatogram $C_0^k(t)$ with polynomial of degree N (i.e. fit $C_m^{k,i}(t)$ with a polynomial $P_m^{k,i}(t,N)$, where $C_m^{k,i}(t)$ is the value of the reduced chromatogram for the i-th portion on the m-th iteration of the k-th Edman cycle)
Calculate $\sigma_m^{k,i}$, $\delta(\sigma_m^{k,i})$
Increment m
Calculate $C_m^{k,i}(t)$ by removing all points from $C_{m-1}^{k,i}(t)$ where $|C_{m-1}^{k,i}(t) - P_m^{k,i}(t,N)| > \delta(\sigma_m^{k,i})$
Reference numerals: 211, 215, 217, 219, 221
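The fit-and-reject loop of Fig. 2A can be illustrated with a minimal Python sketch. It handles a single portion of one chromatogram, and it assumes that the rejection threshold $\delta(\sigma)$ is a fixed multiple of the residual standard deviation. The function name `iterative_poly_baseline` and the parameters `n_sigma` and `max_iter` are hypothetical and are not taken from the specification.

```python
import numpy as np

def iterative_poly_baseline(t, c, degree=3, n_sigma=2.0, max_iter=20):
    """Fit a polynomial baseline, repeatedly dropping points far from the fit.

    Assumption: delta(sigma) = n_sigma * sigma. The figure does not give
    the actual form of delta, so this choice is illustrative only.
    """
    t = np.asarray(t, dtype=float)
    c = np.asarray(c, dtype=float)
    keep = np.ones(c.size, dtype=bool)          # points still in the reduced chromatogram
    for _ in range(max_iter):
        coeffs = np.polyfit(t[keep], c[keep], degree)
        resid = c - np.polyval(coeffs, t)
        sigma = resid[keep].std()               # scatter of retained points about the fit
        new_keep = keep & (np.abs(resid) <= n_sigma * sigma)
        # Stop when nothing more is rejected or too few points remain to fit.
        if new_keep.sum() == keep.sum() or new_keep.sum() <= degree + 1:
            break
        keep = new_keep
    return np.polyval(coeffs, t), keep
```

Called as `baseline, keep = iterative_poly_baseline(t, c, degree=1)`, the function returns the fitted baseline evaluated at every sample together with a mask of the points that survived rejection. Peak samples are rejected first because they lie furthest from the fit.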
Fig. 2B

Calculate baseline $b^k(t) = \{ P_l^{k,i}(t),\ l = \max m \text{ for each } i \}$ for each point in $C_0^k(t)$
Calculate low-frequency, background-corrected chromatogram $C_0^k(t) = C_0^k(t) - b^k(t)$
Perform fast Fourier transform on $C_0^k(t)$ to remove high frequency noise
Detect and quantitate peaks in filtered chromatogram to obtain $PTH_{j,0}(k)$
If k = number of amino acids in peptide: Yes, go to 17; No, increment k
Reference numerals: 227, 229, 231, 233, 232, 235
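The Fourier-transform step of Fig. 2B can be sketched as a simple low-pass filter. The figure does not say how the high-frequency components are removed, so the hard cutoff used here is an assumption. The function name `fft_lowpass` and the parameter `cutoff_fraction` are hypothetical.

```python
import numpy as np

def fft_lowpass(c, cutoff_fraction=0.1):
    """Remove high-frequency noise by zeroing the upper part of the spectrum.

    Assumption: a hard cutoff that keeps the lowest `cutoff_fraction` of the
    frequency bins. The source does not specify the filter shape.
    """
    c = np.asarray(c, dtype=float)
    spectrum = np.fft.rfft(c)                    # one-sided spectrum of a real signal
    cutoff = int(cutoff_fraction * spectrum.size)
    spectrum[cutoff:] = 0.0                      # discard everything above the cutoff bin
    return np.fft.irfft(spectrum, n=c.size)      # back to the time domain, same length
```

A hard cutoff can introduce ringing near sharp peaks. A smoother roll-off is a common alternative when that matters.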
Fig. 2C

Initialize constants and counters
Set $\alpha$ = 1; set N to match chromatogram peak widths (= N1)
Calculate filtered chromatogram:
$C_1^k(t) = A_N^\alpha C_0^k(t) = \sum_{n=-N}^{N} a_N^\alpha(n)\, C_0^k(t-n)$
where $a_N^\alpha(n) = \frac{2\alpha\,(N+1-|n|)}{N(N+1)} - \frac{2\alpha(N+1)-N}{N(2N+1)}$
Set $\alpha$ = 20
Calculate $C_2^k(t) = \sum_{n=-N}^{N} a_N^\alpha(n)\, C_1^k(t-n)$
Define $C_3^k(t) = C_2^k(t) - C_1^k(t)$
Chop off points below low threshold to yield $C_4^k(t)$
Set $\alpha$ = 1/2, N = N1/2
Calculate $C_5^k(t) = A_N^\alpha C_4^k(t)$
Reference numerals: 238, 262, 240, 242, 244, 246, 248, 250, 252, 254
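The two filter passes and the subtraction of Fig. 2C can be illustrated with a short Python sketch. It assumes the filter coefficients have the form $a(n) = 2\alpha(N+1-|n|)/(N(N+1)) - (2\alpha(N+1)-N)/(N(2N+1))$ for n = -N to N, as read from the figure and the claims. The default values of $\alpha$ follow the figure as printed and should be treated as illustrative. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def filter_kernel(N, alpha):
    """Coefficients a(n), n = -N..N, of the shifted triangular filter."""
    n = np.arange(-N, N + 1)
    triangle = 2.0 * alpha * (N + 1 - np.abs(n)) / (N * (N + 1))
    shift = (2.0 * alpha * (N + 1) - N) / (N * (2 * N + 1))
    return triangle - shift                      # sums to 1 for every alpha

def apply_filter(c, N, alpha):
    """Discrete convolution sum_n a(n) c(t - n); samples near the ends are unreliable."""
    return np.convolve(c, filter_kernel(N, alpha), mode="same")

def baseline_correct(c0, N, alpha_smooth=1.0, alpha_enhance=20.0):
    """Return the baseline corrected (fourth) chromatogram."""
    c1 = apply_filter(c0, N, alpha_smooth)       # second chromatogram: noise filtered
    c2 = apply_filter(c1, N, alpha_enhance)      # third chromatogram: resolution enhanced
    return c2 - c1                               # fourth chromatogram: baseline cancels

# Synthetic example (assumed data): sloping baseline plus one Gaussian peak.
t = np.arange(400)
c0 = 10.0 + 0.05 * t + 100.0 * np.exp(-0.5 * ((t - 200) / 5.0) ** 2)
c3 = baseline_correct(c0, N=5)
```

Because the coefficients are symmetric and sum to one for any $\alpha$, a constant or linear baseline passes through both filter passes unchanged. It therefore cancels in the subtraction, while the peak survives as a positive feature flanked by negative lobes. This is consistent with the later step that chops off points below a threshold.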
Fig. 2D

Detect and quantitate peaks
Compare peaks with known reference peaks to identify amino acids in sample corresponding to said reference peaks, to obtain $PTH_{j,0}(k)$
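The detection and comparison steps of Fig. 2D can be sketched as a local-maximum search followed by a nearest-reference lookup. The source gives no detection rule or matching tolerance, so both are assumptions here. The names `detect_peaks` and `identify` and the reference positions in the usage note are hypothetical.

```python
import numpy as np

def detect_peaks(c, threshold=0.0):
    """Return (index, height) for each local maximum above `threshold`."""
    c = np.asarray(c, dtype=float)
    mid = c[1:-1]
    is_peak = (mid > c[:-2]) & (mid >= c[2:]) & (mid > threshold)
    return [(int(i), float(c[i])) for i in np.flatnonzero(is_peak) + 1]

def identify(peaks, reference, tolerance):
    """Match each peak to the nearest reference position within `tolerance`.

    `reference` maps a compound name to its expected position (in samples).
    Returns a dict of name -> measured peak height.
    """
    amounts = {}
    for position, height in peaks:
        name = min(reference, key=lambda k: abs(reference[k] - position))
        if abs(reference[name] - position) <= tolerance:
            amounts[name] = height
    return amounts
```

For example, with `reference = {"Asp": 10, "Glu": 31}` and `tolerance=2`, a peak found at sample 30 would be reported under "Glu". A peak further than the tolerance from every reference position is left unassigned.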
Fig. 4A

Start
Initialize constants and counters: set S = 2; set M; set j = 1; set [...] = 0
Set l = 0
Fit $PTH_{j,l}(k)$ with polynomial $Q_{j,l}(M,k)$ of degree M, where l+1 is the iteration number
Calculate $PTH_{j,l}(k)$ by removing those points (i.e. cycles) where $|PTH_{j,l-1}(k) - Q_{j,l}(k)| > \delta(\sigma_{j,l})$. Comment: $PTH_{j,l}(k) = PTH_{j,l-1}(k)$ for points k common to both domains, but the domain of $PTH_{j,l}(k)$ is smaller than that of $PTH_{j,l-1}(k)$
Set m = 0
For all cycles k, calculate background-corrected raw PTH amounts: $PTH'_{j,0}(k) \equiv PTH_{j,0}(k) - Q_{j,l}(k)$
Reference numerals: 411, (400), 415, 417, 419, 423, 425, 424
Fig. 4B

Fit $PTH'_{j,m}(k)$ with polynomial $Z_{j,m}(k)$ of degree 1, where m+1 denotes the iteration number
Calculate $\sigma'_{j,m}$, $\delta'(\sigma'_{j,m})$
Calculate $PTH'_{j,m}(k)$ by removing those points (i.e. cycles) where $|PTH'_{j,m-1}(k) - Z_{j,m}(k)| > \delta'(\sigma'_{j,m})$. Comment: $PTH'_{j,m}(k) = PTH'_{j,m-1}(k)$ for points common to both domains
For all k cycles, calculate normalized background-corrected PTH amounts: $\overline{PTH}_{j,0}(k) = PTH'_{j,0}(k) / \sigma'_{j,m}$
Select preliminary sequence: find max $\overline{PTH}_{j,0}(k)$ for each cycle k
Reference numerals: 410, 429, 443
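The normalization and preliminary sequence selection at the end of Fig. 4B can be sketched as follows. The sketch assumes the background-corrected amounts are held in an array indexed by amino acid j (rows) and cycle k (columns), with one standard deviation per amino acid. The function name and the example values are hypothetical.

```python
import numpy as np

def preliminary_sequence(pth_corrected, sigma, residues):
    """Pick, for each cycle, the amino acid with the largest normalized amount.

    pth_corrected: array of shape (n_amino_acids, n_cycles), background corrected.
    sigma:         array of shape (n_amino_acids,), scatter of each amino acid's background.
    residues:      names in the same order as the rows of pth_corrected.
    """
    normalized = np.asarray(pth_corrected, dtype=float) / np.asarray(sigma, dtype=float)[:, None]
    return [residues[j] for j in normalized.argmax(axis=0)]

# Illustrative values only: the third amino acid has a noisy background,
# so its large raw amounts count for less after normalization.
residues = ["Ala", "Gly", "Glu"]
pth = np.array([[9.0, 1.0, 1.0],
                [2.0, 8.0, 3.0],
                [20.0, 30.0, 100.0]])
sigma = np.array([1.0, 1.0, 50.0])
call = preliminary_sequence(pth, sigma, residues)
```

Dividing by the per-residue standard deviation puts all amino acids on a common signal-to-noise scale before the maximum is taken, so a residue with a noisy background does not dominate every cycle.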
Fig. 5A (Lag correction)

Sort $PTH_j(k)$ and set $Y_k = \max PTH_j(k)$ for all cycles k = 1 to N
Set $Y_{k,k+1} = PTH_j(k+1)$ for all k = 1 to N, where j corresponds to amino acid number for which $PTH_j(k)$ is a maximum in cycle k
Initialize cycle counter: n = 1
Calculate lag coefficients $Y_{k,k+1} / Y_k = k\,b(k)$ for k = n to N
Fit $q\,b(q)$ with polynomial B(q) of degree Z for q = 1 to N
Calculate $\sigma_b$, the standard deviation of B(q) from the actual measurements $Y_{q,q+1} / Y_q$ over the domain N
Reference numerals: 311, 515, 517, 513
Fig. 5B

Throw out all points q such that $|Y_{q,q+1} / Y_q - B(q)|$ [...] to form a new domain N' (set new lag values B(q) = 0 for those q not in N')
Fit $q\,b(q)$ with polynomial B'(q) of degree Z, where q is an element of the domain N'
Generate revised, fitted lag values: set $Y_{k,k+1} / Y_k = B'(k)$ for all k = 1 to N
Calculate: b(n) = B'(n) / n
Calculate fitted lags $Y_{n,n+i} / Y_n = G(n,i,b)$ for all cycles i until $Y_{n,n+i} / Y_n < 0.01$, where $G(n,i,b) \equiv \frac{n+i-1}{i} \cdots \frac{n+2}{3} \cdot \frac{n+1}{2} \cdot \frac{n+0}{1}$
Reference numerals: 523, 525, 527, 529, 531
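The fitted-lag calculation of Fig. 5B can be sketched in Python. The product legible in the figure equals the binomial coefficient C(n+i-1, i). The way b enters G is not legible in the source, so the factor `b ** i` below is an assumption. It is chosen because it reproduces the one-cycle lag coefficient $Y_{k,k+1}/Y_k = k\,b(k)$ when i = 1. The function names and the `max_i` guard are hypothetical.

```python
from math import comb

def lag_ratio(n, i, b):
    """Fraction of the cycle-n signal expected to appear i cycles later.

    The product ((n+i-1)/i)...((n+1)/2)(n/1) equals comb(n + i - 1, i).
    Assumption: the dependence on b is b**i, which gives n*b for i = 1.
    """
    return comb(n + i - 1, i) * b ** i

def fitted_lags(n, b, cutoff=0.01, max_i=50):
    """Lag ratios for i = 1, 2, ... until the ratio drops below `cutoff`."""
    ratios = []
    for i in range(1, max_i + 1):
        ratio = lag_ratio(n, i, b)
        if ratio < cutoff:
            break
        ratios.append(ratio)
    return ratios
```

With n = 5 and b = 0.02, for instance, the one-cycle lag ratio is 5 x 0.02 = 0.1 and the two-cycle ratio is 15 x 0.0004 = 0.006. The two-cycle ratio falls below the 0.01 cutoff, so only the first lag is retained.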
Fig. 5C

Sort $PTH_j(n)$ to find 3 highest values, and define 3 highest values in the order of their magnitudes as $Y_n^1$, $Y_n^2$, $Y_n^3$, and corresponding lags $Y_{n,n+i}^1$, $Y_{n,n+i}^2$, $Y_{n,n+i}^3$ for i = 1 to N-n-1
Set $Y_n^{j,S} = Y_n^j$
Do j = 1 to 3
Do for i = 1 to N-n
Calculate $Y_{n,n+i}^{F,j} = G(n,i,b)\,Y_n^{j,S}$
Set $Y_n^{j,S} = Y_n^{j,S} + \min\left(Y_{n,n+i}^{j},\ Y_{n,n+i}^{F,j}\right)$
Increment j if j < 3, otherwise exit loop
Sort $Y_n^{j,S}$ by magnitude; find max $Y_n^{j,S}$; find amino acid index for max $Y_n^{j,S}$; set $Y_n$ [...]
Reference numerals: 533, 537, 541, 539, 543, 545, 547, 551
Fig. 5D

Calculate fitted lags for cycle n: $Y_{n,n+i}^F = G(n,i,b)\,Y_n$ for all i until $Y_{n,n+i}^F < 0.01$
Correct cycle n for lag into later cycles: $Y_n \Rightarrow Y_n + \min\left(Y_{n,n+i},\ Y_{n,n+i}^F\right)$ for all cycles i where $Y_{n,n+i} > 0$ until $Y_{n,n+i}^F / Y_n < 0.01$
Correct cycle n+i for lag from cycle n: $Y_{n,n+i} \Rightarrow \max\left(0,\ Y_{n,n+i} - Y_{n,n+i}^F\right)$ for all cycles i until $Y_{n,n+i}^F / Y_n < 0.01$
Correct amino acid assignment based on calculated lag correction
Increment cycle counter n
Is n > N? Yes: continue; No: [...]
Reference numerals: 559
Fig. 6 (Injection correction)

Define array $\{Z_{j,n}\}$, where the element $Z_{j,n}$ is the value of the PTH amount for amino acid of kind j at cycle n calculated from the lag correction subroutine 19
Calculate average of $Z_{j,n}$ over all j except j [...] for each n. Define result as $\{\bar{Z}_n\}$
Calculate injection corrected raw PTH value, $INJ_j(n)$:
$INJ_{j,n} = PTH_{j,0}(n) - \bar{Z}_n\,\sigma_{j,l}$
where $PTH_{j,0}(n)$ is the raw PTH value found in subroutine 15 for amino acid j in cycle n, and $\sigma_{j,l}$ is the standard deviation calculated in subroutine 17 on the last iteration of l.
Set $PTH_{j,0}(n) = INJ_{j,n}$
Reference numerals: 613, 615, 617